Learn to Track: From Images to 3D Data
Abstract
Object pose estimation aims to determine the relation between the camera and the object of interest. For rigid objects, the pose is defined by six degrees of freedom: three for rotation and three for translation. Across applications in robotic perception, augmented reality and human-computer interaction, real-time, robust object pose estimation is integral to reaching their goals. This thesis focuses on object pose estimation using a temporal tracker on 3D data. The objective is not only to achieve robustness on public benchmarks, but also to integrate the tracker into a wide range of real-world applications. A temporal tracker is a frame-to-frame algorithm that carries over the object's pose from the previous frame and updates it for the current frame. Temporal trackers are frequently used in reconstruction frameworks to track a single object or a static scene. However, when tracking several independently moving objects in the scene, most frameworks rely on tracking-by-detection. Tracking-by-detection methods, by contrast, treat each frame independently: for each frame, they localize the object in the image and estimate its pose. On the one hand, a temporal tracker has the limitation that it must be initialized on the first frame, either manually or by a detector. On the other hand, a temporal tracker is efficient, because it reduces the problem to updating the pose between two consecutive frames. Intuitively, combining both approaches, where the detector initializes the temporal tracker, which then tracks the objects over time, is a natural procedure. However, to be robust against clutter, occlusions and fast movements, most temporal trackers require at least 100 ms per frame with GPU optimization. Since an independent temporal tracker runs for each object in the scene, the cost also scales linearly with the number of objects.
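The frame-to-frame update described above can be sketched as follows. This is a minimal illustration, not the thesis's implementation. It assumes that the pose is a 4×4 homogeneous transform built from three rotation angles and three translations, and that each update is left-multiplied onto the previous pose. `predict_update` is a hypothetical stand-in for whatever estimates the inter-frame motion, and `initial_pose` is what a detector (or a manual initialization) would supply on the first frame.

```python
import math
from typing import Callable, Iterable, List

Matrix = List[List[float]]


def pose_from_params(rx: float, ry: float, rz: float,
                     tx: float, ty: float, tz: float) -> Matrix:
    """Build a 4x4 rigid transform from six degrees of freedom:
    three rotation angles in radians (R = Rz @ Ry @ Rx) and three translations."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    rot = [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]
    return [rot[0] + [tx], rot[1] + [ty], rot[2] + [tz], [0.0, 0.0, 0.0, 1.0]]


def compose(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def track(frames: Iterable[object], initial_pose: Matrix,
          predict_update: Callable[[object, Matrix], Matrix]) -> List[Matrix]:
    """Temporal tracking: carry the pose over from the previous frame and
    apply the update estimated for the current frame."""
    pose = initial_pose                       # e.g. supplied by a detector
    history = []
    for frame in frames:
        delta = predict_update(frame, pose)   # hypothetical per-frame estimator
        pose = compose(delta, pose)           # assumed convention: new = delta @ old
        history.append(pose)
    return history
```

With several objects in the scene, one such loop runs per object. This is why the total cost grows linearly with the number of objects.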
In effect, tracking-by-detection approaches become more efficient than temporal trackers once about five objects are in the scene. As a consequence, temporal trackers are not commonly used, because they cannot deliver their promised efficiency. With this in mind, we formulate a learning-based 3D temporal tracker that learns random forests solely from a 3D CAD model to predict the transformation update from the input depth images. On average, it runs in less than 2 ms per frame for each object on a single CPU core. Hence, even though the computational complexity of the tracker is linear in the number of objects, its extremely low per-object tracking time allows the algorithm to track over a hundred moving objects in a scene at 30 fps with only 8 CPU cores. This is significantly faster than any tracking-by-detection approach. Owing to the tracker's efficiency, we have successfully demonstrated the combination of detection and temporal tracking in a single framework that performs seamlessly, i.e. with low latency. The first version of the tracker works on specific object instances, meaning that the shape of the object that appears in the scene has to match the geometry of the learned CAD model. We further developed an extension that generalizes the tracker to estimate the pose of any object within a class. Applied to 3D head pose estimation, this allows the generalized tracker to directly estimate the pose of an arbitrary user without any prior information about them. Nevertheless, tracking with an object instance still attains better accuracy than a generalized tracker. For this reason, we introduce a fast and reliable calibration procedure that optimizes the shape model of a specific object from a given class. In this thesis, to
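The claim that the tracker is learned "solely from a 3D CAD model" can be illustrated with a deliberately tiny example. Everything here is an assumption made for illustration and is not the thesis's formulation:

- the "model" is a hypothetical 1-D depth profile (`depth_profile`);
- the pose update is a single lateral shift rather than a full 6-DOF transform;
- the features are depth differences at fixed sample pixels (`PIXELS`);
- the random forest is a small from-scratch regressor that uses bootstrap samples and a random feature subset at each split.

What the sketch does show is that the training pairs (depth features, pose update) are synthesized from the model alone, without any recorded sequences.

```python
import math
import random
from typing import List, Union


def depth_profile(u: float, shift: float = 0.0) -> float:
    """Hypothetical 1-D 'CAD model': depth (m) of a bump seen at image
    coordinate u when the object is shifted laterally by `shift`."""
    return 1.0 - 0.1 * math.exp(-((u - shift) ** 2) / 0.02)


PIXELS = [i / 10 - 0.5 for i in range(11)]  # fixed sample locations in the image


def features(shift: float) -> List[float]:
    """Depth differences between the current frame and the model at the previous pose."""
    return [depth_profile(u, shift) - depth_profile(u) for u in PIXELS]


Node = Union[float, tuple]  # leaf value, or (feature, threshold, left, right)


def sse(values: List[float]) -> float:
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split(X, y, f, thr):
    """Partition (row, target) pairs by whether feature f is <= thr."""
    pairs = list(zip(X, y))
    return ([p for p in pairs if p[0][f] <= thr], [p for p in pairs if p[0][f] > thr])


def build_tree(X, y, depth, rng, max_depth=6, n_try=3, min_leaf=3) -> Node:
    """Grow a regression tree greedily, minimizing the summed squared error."""
    if depth >= max_depth or len(y) <= min_leaf:
        return sum(y) / len(y)                       # leaf: mean pose update
    best = None
    for f in rng.sample(range(len(X[0])), n_try):    # random feature subset per split
        for thr in sorted({row[f] for row in X}):
            left, right = split(X, y, f, thr)
            if not right:                            # threshold at the maximum: no split
                continue
            score = sse([t for _, t in left]) + sse([t for _, t in right])
            if best is None or score < best[0]:
                best = (score, f, thr, left, right)
    if best is None:                                 # all sampled features constant
        return sum(y) / len(y)
    _, f, thr, left, right = best

    def grow(pairs):
        return build_tree([r for r, _ in pairs], [t for _, t in pairs],
                          depth + 1, rng, max_depth, n_try, min_leaf)

    return (f, thr, grow(left), grow(right))


def train_forest(n_trees=10, n_samples=100, max_shift=0.05, seed=0) -> List[Node]:
    rng = random.Random(seed)
    # Training data is synthesized from the model alone: random pose updates
    # and the depth differences they would produce.
    shifts = [rng.uniform(-max_shift, max_shift) for _ in range(n_samples)]
    X = [features(s) for s in shifts]
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n_samples) for _ in range(n_samples)]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [shifts[i] for i in idx], 0, rng))
    return forest


def predict(forest: List[Node], x: List[float]) -> float:
    """Average the per-tree predictions of the pose update."""
    total = 0.0
    for node in forest:
        while isinstance(node, tuple):
            f, thr, left, right = node
            node = left if x[f] <= thr else right
        total += node
    return total / len(forest)
```

Calling `predict(train_forest(), features(0.03))` should return a value close to 0.03. A real tracker would need all six degrees of freedom, for example one regressor per parameter or a multi-output forest, and far richer features.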
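The scaling claim can be checked with back-of-the-envelope arithmetic. At 30 fps the frame budget is 1000/30 ≈ 33.3 ms. A single core running a 2 ms-per-object tracker therefore fits 16 objects per frame, and 8 cores fit 128, which is consistent with "over a hundred". The sketch assumes three things: each object's tracker runs independently on one worker, the per-object cost is a constant 2 ms (the abstract gives an average below 2 ms), and detection, I/O and scheduling add no overhead. `max_tracked_objects` is a hypothetical helper, not part of the thesis.

```python
import math


def max_tracked_objects(fps: float, ms_per_object: float, workers: int) -> int:
    """Upper bound on the number of objects tracked in real time.

    Assumes independent per-object trackers spread evenly over `workers`
    (e.g. CPU cores), a constant cost of `ms_per_object` per object per frame,
    and no other per-frame overhead (detection, I/O, scheduling)."""
    frame_budget_ms = 1000.0 / fps                          # time available per frame
    per_worker = math.floor(frame_budget_ms / ms_per_object)  # objects one worker can serve
    return per_worker * workers


if __name__ == "__main__":
    print(max_tracked_objects(fps=30, ms_per_object=2.0, workers=8))    # 128
    print(max_tracked_objects(fps=30, ms_per_object=100.0, workers=1))  # 0
```

The second call shows the other side of the argument: a tracker that needs 100 ms per object cannot sustain 30 fps even for a single object, so running one such tracker per object quickly loses to tracking-by-detection.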
Publication date: 2017